Investigating the Change of Web Pages' Titles Over Time

Authors

  • Martin Klein
  • Michael L. Nelson
Abstract

Inaccessible web pages are part of the browsing experience. The content of these pages, however, is often not completely lost but rather missing. Lexical signatures (LSs) generated from the web pages’ textual content have been shown to be suitable as search engine queries when trying to discover a (missing) web page. Since LSs are expensive to generate, we investigate the potential of web pages’ titles, as they are available at a lower cost. We present the results from studying the change of titles over time. We take titles from copies provided by the Internet Archive of randomly sampled web pages and show the frequency of change as well as the degree of change in terms of the Levenshtein score. We found very low frequencies of change and high Levenshtein scores, indicating that titles, on average, change little from their original, first observed values (rooted comparison) and even less from the values of their previous observation (sliding comparison).
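
The rooted and sliding comparisons described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the Levenshtein score is normalized, so the sketch assumes a similarity of 1 minus the edit distance divided by the longer title's length (1.0 means identical titles). The function names `levenshtein`, `similarity`, `rooted_scores`, and `sliding_scores` are hypothetical, as are the sample titles.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using a two-row table."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """ASSUMED normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty titles are treated as identical
    return 1.0 - levenshtein(a, b) / longest


def rooted_scores(titles: List[str]) -> List[float]:
    """Compare every later observation with the first observed title."""
    root = titles[0]
    return [similarity(root, t) for t in titles[1:]]


def sliding_scores(titles: List[str]) -> List[float]:
    """Compare every observation with the one immediately before it."""
    return [similarity(prev, cur) for prev, cur in zip(titles, titles[1:])]


if __name__ == "__main__":
    # Hypothetical titles of one page, ordered by archive date.
    observed = [
        "Example Lab - Home",
        "Example Lab - Home",
        "Example Lab - Welcome",
        "Example Laboratory - Welcome",
    ]
    print("rooted: ", rooted_scores(observed))
    print("sliding:", sliding_scores(observed))
```

Under this assumed normalization, a page whose title drifts gradually keeps high sliding scores while its rooted scores fall, which is the distinction the abstract draws between the two comparisons.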


Similar resources

Investigating the unofficial factors in Google ranking

This paper evaluates the effectiveness of some “unofficial” factors in Search Engine Optimisation. A summary of official Google guidelines is given, followed by a review of “unofficial” ranking factors as reported by a number of experts in the field of Search Engine Optimisation. These opinions vary and do not always agree. Experiments on keyword density, web page titles and the use of outbound...


Web Anomaly Detection Through Creating Access Usage Profiles

Due to the increase in cyber-attacks, the need for web server attack detection techniques has drawn attention. Unfortunately, many available security solutions are inefficient at identifying web-based attacks. The main aim of this study is to detect abnormal web navigation based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...


Using the Web Infrastructure for Real Time Recovery of Missing Web Pages

Martin Klein, Old Dominion University, 2011. Director: Dr. Michael L. Nelson. Given the dynamic nature of the World Wide Web, missing web pages, or “404 Page not Found” responses, are part of our web browsing experience. It is our intuition that information on the web is rarely completely lost; it is just missing. In whole or...


A New Hybrid Method for Web Pages Ranking in Search Engines

There are many algorithms for optimizing search engine results; ranking takes place according to one or more parameters such as backward links, forward links, content, click-through rate, etc. The quality and performance of these algorithms depend on the listed parameters. Ranking is one of the most important components of the search engine that represents the degree of the vitality...


Temporal Anchor Text as Proxy for Real User Queries

Web archives preserve the fast-changing web. While we can archive the web pages, the popularity of queries in the past has usually not been preserved. Previous studies have observed the importance of anchor text for improving the quality of text search, and have shown that anchor text is similar to real user queries and document titles. Other studies have shown that document titles are simila...



Journal:
  • CoRR

Volume: abs/0907.3445

Publication date: 2009